OpenAI announces “HealthBench” to evaluate medical AI models. LLM > specialist doctors, but there is almost no difference in scores between “LLM alone” and “doctor + LLM”...

Update: 2025-06-02

Description

This week in medical news, OpenAI announced HealthBench, a benchmark for evaluating medical AI models. Its results suggest that large language models (LLMs) may soon surpass specialists, though doctors assisted by an LLM currently score about the same as the LLM alone. In another development, AI Scientist, a multi-agent system, discovered a promising drug candidate for a major cause of blindness, demonstrating a closed-loop AI approach to scientific discovery. Meanwhile, a "Don't Die" movement focused on radical life extension is gaining traction in Silicon Valley, with followers using services such as genetic testing kits, and traditional weight loss programs are struggling, as shown by the bankruptcy of WW International amid the rise of GLP-1 medications.

In Channel

U.S. EHR Leader Epic Evolves Electronic Health Records with AI—CoMET and Three New Assistants Transform the Future of Clinical Care

2025-09-22 14:41

Apple adds hypertension risk alert feature to Apple Watch. An algorithm reviews data from the past 30 days in the background and sends a notification if it detects a risk of high blood pressure.

2025-09-16 20:51

The Effectiveness of Health Apps in Specific Health Guidance: Do Health Apps Really Improve Retention Rates and Test Values?

2025-09-08 15:19

Fujitsu has announced an orchestrator platform that leverages NVIDIA technology to integrate and manage multiple AI agents for healthcare settings. Will AI agent development companies get on board?

2025-09-01 16:37

Eight Sleep, which offers smart sleep mattresses, has raised $100 million in Series D funding. The mattress costs $5,800 plus an annual fee of $399, which is quite expensive, but does it really work?

2025-08-25 16:22

CareNet to be taken private through a tender offer by PE fund Curie1 Co., Ltd. Following MedPeer, will m3 become the sole leader among physician platforms?

2025-08-18 13:36

According to a survey by the Japan Medical Association, more than half of clinics say that electronic medical records are “impossible to implement.”

2025-08-12 12:32

[M3 (2413)] 2025 Q1 Financial Results. Signs of a turnaround seen in M3's Q1 financial results: Recovery of the physician platform business and the current status of the medical AI ecosystem strategy.

2025-08-06 10:12

Advantage acquires Nippon Pharmacy. Plans to delist shares for just under 100 billion yen. Pharmacy business falls into the red amid intensifying competition. Specialized pharmacies facing headwinds?

2025-08-04 16:14

Neuralink's Future: From Games to Telepathy and Vision Restoration

2025-07-28 12:19

Approval granted for insurance coverage of 3 medical devices, including an app to assist with alcohol reduction (CureApp). Is the reimbursement price of $50 for the alcohol reduction app reasonable?

2025-07-22 12:29

Samsung acquires Xealth, a digital health platform that provides electronic medical record APIs. The battle for wearable devices and the battle for data acquisition are both heating up.

2025-07-14 13:44

OPERE, operator of medical communication SaaS “Pokesapo,” raises 400 million yen in pre-Series A funding. How does it differ from patient explanation support services such as Contrea and MediOS?

2025-07-07 16:10

The most toxic substances for the liver are sweetened beverages such as 100% fruit juice, vegetable juice, and energy drinks. Solid foods are OK, but liquids are not?

2025-06-30 17:03

Fujifilm uses AI to generate diagnostic findings, reducing the burden by 30 million cases per year. Is it true that humans are necessary to take responsibility?

2025-06-23 12:58

Kakehashi completes Series D round of financing worth approximately 140 million dollars. Musubi has been installed in 14,000 locations, likely surpassing EM Systems to become the industry leader

2025-06-16 13:03

Kirin's AI prediction application “Premedi” wins second Japan New Business Award. Two approaches: AI demand prediction for long-tail pharmaceuticals and loss minimization through delivery and buyback.

2025-06-09 18:44

OpenAI announces “HealthBench” to evaluate medical AI models. LLM > specialist doctors, but there is almost no difference in scores between “LLM alone” and “doctor + LLM”...

2025-06-02 17:20

How far can digital therapeutics go? Differences between Japan and the US in DTx regulations, insurance reimbursement, and penetration rates.

2025-05-29 14:29

Medley 2025 Q1 Results. New Job Medley Spot Launched, and Generative AI Utilization Is One of the Pillars of Its Medical Platform Strategy

2025-05-14 12:30

Kazutaka Yoshinaga
